ABSTRACT



A program for monitoring computer system performance 
includes a collection of source code modules in the form of 
a high level language. Each of the source code modules is 
compiled into a corresponding object code module. The 
object code modules are assembled into machine dependent 
code. The machine code is translated into a program module 
in the form of a machine independent register translation 
language. The program module is partitioned into basic 
program components. The basic program components 
include procedures, basic blocks within procedures, and 
instructions within basic blocks. Fundamental instrumenta- 
tion routines identify, locate, and modify specific program 
components to be monitored. The modified basic program 
components are converted to an instrumented machine 
executable code to be executed in the computer system so 
that performance data can be collected while the program is 
executing in the computer. 

15 Claims, 8 Drawing Sheets 
SYSTEM FOR MONITORING COMPUTER SYSTEM PERFORMANCE

This is a divisional of application Ser. No. 08/778,648,
filed on Jan. 3, 1997, now U.S. Pat. No. 5,732,273, which is
a continuation of application Ser. No. 08/514,020, filed Aug.
11, 1995, now abandoned, which is a continuation-in-part of
application Ser. No. 08/204,834, filed on Mar. 1, 1994 now
U.S. Pat. No. 5,539,907.

FIELD OF THE INVENTION

The present invention relates generally to computer
systems, and more particularly to a method and apparatus for
monitoring the performance of computer systems by
instrumenting programs.

BACKGROUND OF THE INVENTION

Systems for monitoring computer system performance are
extremely important to hardware and software engineers.
Hardware engineers need systems to determine how new
computer hardware architectures perform with existing
operating systems and application programs. Specific
designs of hardware structures, such as memory and cache,
can have drastically different, and sometimes unpredictable
utilizations for the same set of programs. It is important that
any flaws in the hardware architecture be identified before
the hardware design is finalized.

Software engineers need to identify critical portions of
programs. For example, compiler writers would like to find
out how the compiler schedules instructions for execution,
or how well the execution of conditional branches are
predicted to provide input for code optimization.

It is a problem to accurately monitor hardware and
software systems performance. Known systems typically are
hand crafted. Costly hardware and software modifications
may need to be implemented to ensure that system
operations are not affected by the monitoring systems.

Many monitoring systems are known for different
hardware and software environments. One class of systems
simply counts the number of times each basic block of
machine executable instructions is executed. A basic block
is a group of instructions where all the instructions of the
group are executed if the first instruction of the group is
executed. The counts can be studied to identify critical
portions of the program.

Monitoring references to instructions and data addresses
are usually performed by tracing systems. Data address
traces can be used to improve the design of caches, and
increase the efficiency of in-memory data structures.
Instruction address traces can identify unanticipated execution
paths.

In another class of systems, the simulated operation of the
computer system is monitored. Simulators attempt to mimic
the behavior of computer systems without actually executing
software in real time.

There are problems with traditional monitoring systems.
Most systems monitor a limited number of specific system
characteristics, for example, executed instructions or
referenced data. It is difficult for users to modify such systems for
other purposes. Building specialized systems is not a viable
solution since the number of system characteristics to be
monitored is large and variable. If the performance data
supplied by the monitoring system is less than what is
desired the system is of limited use. If the system supplies
too much performance data, the system is inefficient.

Most monitoring systems which count basic blocks
accumulate counts for all the blocks of the program. Other than
by tedious modifications, it usually is not possible to monitor
selected blocks of interest.

Most known tracing systems gather detailed address data
inefficiently. A typical trace for a small program can include
gigabytes of trace data. A user interested in monitoring just
the branch behavior of a program has to sift through entire
traces just to find, for example, conditional branch
instructions.

Simulating the execution of programs at the instruction
level can consume enormous quantities of system resources.
In addition, it is extremely difficult to accurately simulate the
hardware and software behavior of a complex computer
system. Simulated performance data does not always
reliably reflect real run-time performance.

There also are problems with the means used to
communicate performance data. Most systems use expensive
inter-processor data communications channels to communicate
performance data. Inter-processor communication channels
are generally inefficient and may disturb the processing
environment being monitored. Some systems make difficult
modifications to the operating system to improve the
efficiency of monitoring computer systems. Furthermore,
unfiltered performance data can consume large quantities of disk
storage space.

There is a need for a flexible and efficient monitoring
system which can easily be adapted to a diverse set of
monitoring tasks, ranging from basic block counting to
measuring cache utilization. The information data should be
precise, and reflect the actual operation of the computer
system.

SUMMARY OF THE INVENTION

The invention avoids the above and other problems of
known performance monitoring systems and methods, and
satisfies the foregoing described needs. In accordance with
the invention, a system for monitoring the performance of a
computer system, while executing a program, instruments
the machine dependent executable code prior to execution.
The instrumentation places user analysis routines in the
executable code. The user analysis routines are used to
collect performance data while the program is executing.

The machine executable code can be created as a
collection of source code modules in the form of a high level
language by an editor. Each of the source code modules is
compiled into a corresponding object code module. The
object code modules are linked and assembled into machine
dependent executable code. The machine dependent
executable code is translated into an intermediate program module
in the form of a machine independent register translation
language (RTL). All addresses referenced in the RTL
intermediate program module are maintained in a logical symbol
table.

The program module is organized and partitioned into
basic program components. The basic program components
include procedures, basic blocks within procedures, and
instructions within basic blocks. Procedures include
instructions which are related for execution, instructions within
each procedure are grouped into basic blocks. A basic block
is a group of instructions where all of the instructions of the
group are executed if the first instruction of the group is
executed.

Fundamental instrumentation routines are provided to
identify and locate specific program components to be
monitored during execution, the identified specific program
components are instrumented to insert call instructions to
user analysis routines. The user analysis routines are
combined with the instrumented procedures. The procedures are
converted to instrumented machine dependent executable
code for execution.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system which can
be monitored according to the principles of the invention;
FIG. 2 is a top level flow diagram of a monitoring system;
FIG. 3 is a detailed flow diagram of the monitoring
system;
FIG. 4 is a flow diagram of routines for instrumenting a
program;
FIG. 5 is a flow diagram of a program showing calling
edges;
FIG. 6 is a flow diagram of an instrumentation procedure
for branch analysis; and
FIGS. 7 and 8 are block diagrams of allocated memory for
machine code before and after instrumentation.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

FIG. 1 shows a computer system 1 to be monitored
according to the principles of the invention. The computer
system 1 includes a central processing unit (CPU) 2, a
memory 3, and an I/O 4 coupled to each other by a
communications bus 5. The computer system 1 can be a
personal computer, a work-station, a main-frame, or part of
a network of computers executing process steps
independently, or in parallel.

With respect to the component structures of the computer
system 1, the CPU 2 may be of a variety of architectures,
such as complex or reduced instruction set computing (CISC,
RISC) and the like. The CPU 2 can include general purpose
and dedicated registers 8. The registers 8 are for temporarily
storing data acquired from the memory 3.

The memory 3 can include a cache 7 to accelerate the data
flow between the CPU 2 and the memory 3. The cache 7 can
include specialized data and instruction caches. The
structure of the bus 5 is general, and can include dedicated
high-frequency data paths for communicating data, for
example between the cache 7 and the CPU 2. The I/O 4 can
include input and output interfaces for acquiring and
distributing data.

During operation of the computer system 1, data and
processing steps are acquired by the CPU 2, usually from the
memory 3 or cache 7 via the bus 5. The processing steps are
in the form of a sequence of machine executable code, e.g.,
instructions. The data and the machine executable
instructions are usually permanently retained by the I/O 4, and
loaded into the memory 3 as needed. The machine
executable instructions process the data, and store the processed
data back in the memory 3.

The machine executable instructions are tightly coupled
to the architecture of the hardware components. The
hardware architecture specifies word and byte sizes, bus widths
and structures, instruction codes, register naming
conventions, memory addressing schemes, input/output
control, interrupt schemes, timing restrictions, and the like.
The machine executable instructions include operation
codes (opcodes) and operand specifiers which are
specifically encoded for the hardware that executes the
instructions. The machine executable code is sometimes also
known as the "binary image" of the program. In essence, the
CPU 2 executes the machine executable code or binary
image as is, without any further processing. In other words,
the machine executable code is the expression of the source
program that directly manipulates the hardware to perform
the program process steps.

It is desirable to monitor the performance of the computer
system 1 while it is operating. Performance data can be used
to optimize the design of the hardware and software
components of the system. Accurate performance data depends
on monitoring the computer system 1 without substantially
disturbing the processing environment.

For the purpose of monitoring, programs to execute in the
computer system 1 can be viewed as a linear collection of
procedures, the procedures including basic blocks, and the
basic blocks including instructions. For the purpose of this
description, the program can exist in a number of different
forms, including the source program code created by a user
with an editor, object code generated by a compiler, linked
code from a linker, machine executable code or a binary
image, generated by a code generator such as an assembler.
The binary image can be directly processed by the CPU 2 of
the computer system 1. In the preferred embodiment of the
invention, a monitor modifies the binary code to produce
instrumented machine executable code which can be used to
monitor the performance of the computer system 1.

It is desirable to monitor the execution of the program at
the procedure, basic block, and instruction level. For
example, monitoring the execution of the first machine
executable instruction of basic blocks yields block counts.
Monitoring machine executable instructions which load and
store data in the memory allows the user to study cache
utilization. The efficiency of branch prediction can be
estimated by monitoring conditional branch instructions at the
end of blocks.

FIG. 2 is an overview of a system which can be used for
monitoring the performance of the computer system 1 of
FIG. 1. Programmers or users, typically, via the I/O 4, define
a process or program 20 in a high-level machine-independent
language using an editor 10. The high-level
language can be C, Cobol, Fortran, ADA, Pascal, and the
like. If the program 20 is very large, the program 20 is
created as a library or collection of smaller program
segments, usually called source code modules 21-23.

A compiler 30 translates the high-level language of the
program 20 to object code 40 stored in a collection of object
modules 41-43. Usually, there is a one-to-one
correspondence between the source modules 21-23 and the object
modules 41-43. The object modules 41-43 are associated
with corresponding relocation tables and symbol tables
(RST) 44-46. The RSTs 44-46 are used to resolve logical
addresses when the source program is converted to
executable form. Frequently used portions of the program can be
retrieved from a library of object modules.

A linker and/or assembler 45 can combine the object
modules 41-43 into a machine dependent executable code
60. The linker 45 may include pre-compiled object code
modules from a library of object modules 49. Typically, the
machine executable code 60 is loaded into the memory 3 for
execution by the CPU 2.

The machine dependent executable code 60 can be in the
form of the Portable Executable (PE) file format as used in
the Microsoft Corporation's Win32 operating systems, such
as Windows NT and Windows 95. The PE file format is
organized as a linear stream of data including "MZ" headers,
section bodies, and closing out with relocation information,
symbol table information, and line number information.
The executable code 60 is stored in a file section, ".text".
The data definitions for the text sections are stored in data
sections, ".bss", ".rdata", and ".data". The .bss section represents
uninitialized data. The .rdata section represents read-only
data, such as literal strings, constants, and debugging
information. All other variables, other than temporary variables
created in a memory stack during execution, are stored in the
.data section. Resource information is stored in a .rsrc
section. For a more complete description of the PE file
format please see The Portable Executable File Format from
Top to Bottom, Randy Kath, Microsoft Corporation, 1993,
and Peering Inside the PE: A Tour of the Win32 Portable
Executable File Format, Matt Pietrek, Microsoft Systems
Journal, 1994.

Executable code 60 for other operating systems and
hardware architectures has similar relocatable formats; see,
for example, OSF/1 Alpha AXP Assembly Language Guide,
Digital Equipment Corporation, for OSF (Digital Unix)
version 3.0, or higher. The interface for the relocations in the
executable code is defined in a file
"/usr/include/cmplrs/cmrlc.h."

In a preferred embodiment of the invention, a monitor 50 
can be applied to the machine executable code 60 to identify 
specific portions of the program to be monitored for performance
analysis while the program is executing in the CPU
2. Directions on which portions of the program to monitor
can be supplied by the user of the computer system 1. 

The monitor 50 can modify or "instrument" the machine 
dependent executable code 60 to enable the monitoring of 
the performance of the system 1. The modified code is 
converted to instrumented machine dependent executable 
code 60' which can be loaded in the memory 3 for subse- 
quent execution by the CPU 2 of the computer system 1 of 
FIG. 1. While executing the instrumented program, the CPU 
2 receives and provides user data 9A via the memory 3. The 
CPU 2 also generates performance data 9B stored in the
memory 3. The performance data 9B can be retrieved via the
I/O 4.

It is a goal of the invention to facilitate the identification
of portions of the program to be monitored, and to modify
the program in such a way that the performance characteristics
of the computer system 1 are not disturbed. As an
advantage of the invention, the computer system that
executes the instrumented code can have a different hardware
architecture than the target system for which the
uninstrumented code was generated.

FIG. 3 is a flow diagram of the monitoring system 
according to the preferred embodiment of the invention. The 
monitor 50 includes a translator 51, an organizer 54, an
instrumentor 55, and a code generator 57.

In the preferred embodiment of the invention, the monitor
50 operates on the machine dependent executable code 60 to
produce instrumented machine dependent executable code
60'. In addition, as an advantage of the invention, the
instrumented machine dependent executable code can be
executed on a computer system that has a different hardware
architecture than the system which is to execute the
uninstrumented code 60. For example, the uninstrumented
machine code 60 could be generated to execute in an Intel
'486 CISC type of processor. The instrumented code 60'
derived from the code 60 can be targeted for a high performance
RISC machine, such as a 64-bit Alpha computer
system, or vice versa.

The translator 51 transforms the machine dependent code
60 into an intermediate program module 52 in the form of
a register translation language (RTL). The RTL is independent
of any one particular computer system or hardware
architecture. The specific architectures of any number of
different target computer systems can be maintained in a
CPU architecture description 19.

This description 19 can be used to "parse" or disassemble 
the machine code 60. The CPU architecture description 19 
may include instruction operand and operator field 
specifications, instruction timings such as fetch latencies and 
pipeline delays of different architectures. The CPU architecture
description 19 can also describe cache, memory, and
register characteristics, including addressing schemes for
different classes of processors. The CPU architecture
description 19 can also be subsequently used to generate the
instrumented machine dependent executable code 60', perhaps
for a different "output" target architecture than was
intended for the uninstrumented machine code 60.

The translator 51 also transforms all machine code oper- 
and addresses into symbolic addresses. The relocation tables 
and the symbol tables 44-46 are likewise transformed into 
a logical symbol table (LST) 53. The RTL program module 
52, the LST 53, and the CPU architecture description 19 can 
be stored in the memory 3. Translating the machine code 60 
into an intermediate form expedites the identification of 
portions of the program to be monitored, and also allows 
aggressive modification of the program 20 for monitoring 
purposes. 

The organizer 54 partitions the RTL intermediate program 
module 52 into a collection of procedures 100. Each of the 
procedures 101-103 includes instructions which are gener- 
ally related for execution. Furthermore, the procedures 
101-103 are organized into basic blocks 105. A basic block 
105 is a sequence of instructions which are all executed if 
the first instruction of the sequence is executed. The instruc- 
tions are a machine independent translation of the machine 
dependent operation codes and operands of the machine 
executable code 60. 
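
The grouping of instructions into basic blocks can be illustrated with a minimal sketch. It is not the organizer 54 itself. The `Inst` record, its field names, and the rule that a block ends at every "branch" or "jump" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical RTL-like instruction record (illustrative, not the
# patent's encoding): an operation, an optional label on the
# instruction, and an optional symbolic branch target.
@dataclass
class Inst:
    op: str                       # "load", "store", "operate", "branch", "jump"
    label: Optional[str] = None
    target: Optional[str] = None

CONTROL_OPS = {"branch", "jump"}  # assumed set of control transfers


def partition_basic_blocks(insts: List[Inst]) -> List[List[Inst]]:
    """Split one procedure's instructions into basic blocks.

    A new block starts at the first instruction, at any instruction
    whose label is a branch target, and after any control transfer.
    """
    targets = {i.target for i in insts if i.target is not None}
    blocks: List[List[Inst]] = []
    current: List[Inst] = []
    for inst in insts:
        # A branch target must begin a block, so close the open one.
        if current and inst.label in targets:
            blocks.append(current)
            current = []
        current.append(inst)
        # A control transfer is always the last instruction of its block.
        if inst.op in CONTROL_OPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

With this rule, executing the first instruction of a block implies that every other instruction of the block is executed, which is the property the text relies on.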

The organizer 54 also builds a procedure flow graph 
(PFG) 200 and a program call graph (PCG) 300 in the 
memory 3. The PFG 200 maps the flow of control through 
the basic blocks 105 of the procedures 101-103. The PCG 
300 indicates how the procedures 101-103 are called by 
each other. The monitor 50 can use the graphs 200 and 300 
to trace the execution flow through the program while the 
organized procedures 100 are examined. 
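
A program call graph of the kind described above can be sketched as a mapping from each procedure to the procedures it calls. The tuple representation of instructions and the rule that a "jump" whose target names a known procedure counts as a call are assumptions for illustration; the text does not give the layout of the PCG 300.

```python
def build_call_graph(procedures):
    """Build a call graph from a dict: procedure name -> instructions.

    Each instruction is a hypothetical (op, target) tuple. A "jump"
    whose target is the name of another procedure is treated as a call.
    """
    graph = {name: set() for name in procedures}
    for name, insts in procedures.items():
        for op, target in insts:
            if op == "jump" and target in procedures:
                graph[name].add(target)
    return graph


def callers_of(graph, callee):
    """Return the sorted names of all procedures that call `callee`."""
    return sorted(name for name, callees in graph.items() if callee in callees)
```

A procedure flow graph could be represented the same way, with basic blocks in place of procedures and branch targets in place of call targets.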

The instrumentor 55, under the direction of the user, 
identifies and modifies specific portions of the procedures 
100 to be monitored. The process of identifying and modi- 
fying portions of the procedures 100 for the purpose of 
performance monitoring is sometimes known as "instrumen- 
tation" or "instrumenting the code." The instrumentor 55 is 
described in further detail below, with reference to FIG. 4. 

The code generator 57 generates instrumented machine
dependent executable code 60' for a target hardware
architecture. As stated above, as an advantage of the
invention, the target hardware for the instrumented machine
code 60' can have a different architecture than the target
hardware for the uninstrumented code 60.

The monitoring system according to the preferred
embodiment of the invention is now described in more
detail.

The translator 51 converts the executable code 60 into the 
intermediate program module 52. The intermediate repre- 
sentation of the program is in the form of the register 
translation language and the logical symbol table 53. Although
the RTL may be machine-independent, in the preferred
embodiment of the invention the RTL has been oriented for reduced
instruction set computing (RISC) architectures.
The instructions of the RTL include generic operations such
as load, store, jump, branch, and "operate." Only the load
and store operations reference data in the memory 3. Procedure
calls in the RTL are simple transfer of control
instructions, for example "jump." Any parameters passed
upon transfer of control are explicitly exposed. Conditional
and unconditional transfer of control is accomplished with
branch type of instructions.

All arithmetic and logical instructions "operate" on data
stored in the registers 8. Unlike traditional machine code, the
RTL assumes an expandable or infinite set of "virtual"
registers, with some of the registers possibly having dedicated
functions, for example, floating point arithmetic, a
stack-pointer (SP) and a program counter (PC).

All address references in the RTL program module 52 are
symbolic. For example, addresses expressed relative to the
PC are converted to targets which are labels in the logical
symbol table 53. Similarly, all references, direct or
displaced, to data stored in the memory 3 are converted to
symbolic memory references. Converting all addresses to
symbolic form has the advantage of enabling unconstrained 
modification of the instructions of the RTL program module 
52 to enable performance monitoring. 
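
The conversion of PC-relative addresses into labels can be sketched as follows. The fixed instruction size, the convention that a displacement is counted in instructions from the next PC, and the label naming are all assumptions for illustration, not details given in the text.

```python
def symbolize(insts, base_pc=0, inst_size=4):
    """Replace PC-relative branch displacements with symbolic labels.

    insts: list of hypothetical (op, displacement) tuples, where the
    displacement of a "branch" is counted in instructions relative to
    the address of the following instruction (an assumed convention).
    Returns the rewritten instructions and a logical symbol table
    mapping absolute target addresses to labels.
    """
    symbol_table = {}
    rewritten = []
    for index, (op, disp) in enumerate(insts):
        pc = base_pc + index * inst_size
        if op == "branch" and disp is not None:
            target = pc + inst_size + disp * inst_size
            # Reuse the label if this address was already seen.
            label = symbol_table.setdefault(target, f"L{len(symbol_table)}")
            rewritten.append((op, label))
        else:
            rewritten.append((op, None))
    return rewritten, symbol_table
```

Once every branch names a label rather than a displacement, instructions can be inserted between a branch and its target without recomputing any offsets by hand; the final addresses are assigned when code is generated.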

The organizer 54 partitions the module 52 into portions
which can be monitored. First, the module 52 is partitioned
into the collection of procedures 100. Instructions in each of
the procedures 101-103 are further grouped into basic
blocks 105. Basic blocks 105 can be classified as either
"normal" blocks or "control" blocks. A normal block merely
manipulates data. Control blocks do not manipulate data;
they alter the flow of execution. The basic blocks 105
facilitate the tracing of the execution flow.

The procedure flow graph 200 is built for each procedure, 
and the complete program call graph 300 is built for the
entire program. These execution control structures are used 
during the subsequent instrumentation of the program. 

Organizing the RTL module 52 enables the annotation of
the module 52 in a program description 56. The program
description 56 can be stored in the memory 3. The program
description 56 facilitates the manipulation of the procedures
100 and also eases the identification of the fundamental
organizational portions of the program by subsequent
processing steps. The program description 56 can be incorporated
into the RTL module 52 as, for example, comment
fields.

The original structure of the source-level program 20 is
recovered from the machine executable code 60 so that the
monitor 50 can have as much knowledge of the program
organization and control flow of the program as the compiler
30 did. For example, a source-level case-statement is compiled
and assembled into machine code 60 as an indirect
jump to an address from some location in a jump table indexed
by the case index value. The jump table for case-statements
is usually stored in a read-only data area with the addresses
of different jump target locations stored in successive locations.
This makes it possible to recognize case-statements in
the machine code 60. The address of the jump table can be
obtained by examining the case-statement object code.

By identifying all of the case-statements in a program, the
jump table can be partitioned into a set of branch tables of 
a known size. This in turn reveals all possible execution 
destinations. The execution destinations can be used to 
create the control graphs 200 and 300. 
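
The partitioning of the jump table can be sketched as below. The sketch assumes that the table is available as a flat list of target addresses, and that examining each case-statement has already yielded the index at which its branch table begins. Both are illustrative assumptions.

```python
def partition_jump_table(table, starts):
    """Split a jump table into per-case-statement branch tables.

    table:  list of jump target addresses stored in successive
            locations of the read-only data area.
    starts: indices in `table` where a case-statement's branch table
            begins (hypothetical input, found by examining each
            case-statement's indirect jump).
    Each branch table runs from its start to the next start, or to
    the end of the jump table.
    """
    ordered = sorted(set(starts))
    bounds = ordered + [len(table)]
    return {start: table[start:end] for start, end in zip(bounds, bounds[1:])}
```

The entries of each branch table are the possible destinations of one indirect jump, so they can be added as successor edges when the control graphs are built.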

The structure and operation of the instrumentor 55 is now
described in greater detail with reference to FIG. 4.
Instrumentation of the program is a static process, i.e., the
program is not executing. Under the user's direction,
portions of the procedures 100 to be monitored are identified by
"navigating" through the procedures using the program
description 56, the CPU architecture description 19, and the
graphs 200, 300. In the
monitoring system according to the preferred embodiment
of the invention, a number of fundamental procedures or
routines are provided to navigate through the program. The
instrumentor 55 also includes standard routines for modifying
the program for monitoring purposes.

The instrumentor 55 has as input the program to be 
instrumented in the form of the organized procedures 100. 
The instrumentor 55 also uses the LST 53, the program 
description 56, the PFG 200, the PCG 300, and the CPU 
description 19. The instrumentor 55 also uses fundamental 
instrumentation routines (FIR) 47, user instrumentation routines
(UIR) 48, and user analysis routines (UAR) 49.

The FIR 47 are a set of basic procedures for identifying 
and modifying programs to be monitored. The user supplied
UIR 48, in cooperation with the FIR 47, locate and modify
specific portions of the program. The program is modified,
in part, by inserting calls to the user analysis routines 49. 
The user supplied UAR 49 are procedures for collecting and 
analyzing performance data. 

The fundamental instrumentation routines (FIR) 47 can be 
incorporated as a standard component of the instrumentor 55 
when it is created. For example, the FIR 47 can be written 
in a high-level source language such as C. The source 
modules for the FIR 47 can be compiled and linked with the 
source modules of the monitor 50 using standard program- 
ming techniques. The user analysis routines (UAR) 49 can 
also be written in a high-level source language. The UAR 49 
can be part of the library of object modules 49 submitted to 
the monitor 50 along with the procedures 100 of the pro- 
grams to be monitored. 

The fundamental instrumentation routines 47 include
navigational, operational, parsing, and modification routines
(47a-47d). The navigational routines 47a are used to
traverse the static program to deduce structural information
and execution flow information. The operational routines
47b retrieve specific information about the organizational
structures traversed by the navigational routines 47a. The
parsing routines 47c are used to identify and parse identified
instructions. The modification routines 47d change the program
so that it may be monitored.

Table 1 lists the navigational routines 47a of the FIR 47. 

TABLE 1

Navigational Routines

Procedure       Block           Instruction     Edge
GetFirstProc    GetFirstBlock   GetFirstInst    GetFirstSuccEdge
GetLastProc     GetLastBlock    GetLastInst     GetNextSuccEdge
GetNextProc     GetNextBlock    GetNextInst     GetFirstPredEdge
GetPrevProc     GetPrevBlock    GetPrevInst     GetNextPredEdge
GetBlockProc    GetInstBlock                    GetEdgeTo
                                                GetEdgeFrom

GetFirstProc returns as an output a pointer to the first 
procedure of the program that will be executed at run-time. 
The user can call this fundamental instrumentation routine to 
initiate static navigation through the program with a call 
from one of the UIR 48. Similarly, a call to GetLastProc 
returns as an output the pointer to the last procedure of the 
program. 

Given a procedure pointer, GetNextProc and GetPrevProc
return pointers to the next and previous procedures,
respectively, unless there are no more procedures to
navigate, in which case a "null" is returned. GetBlockProc
receives as an input a pointer to a block, and returns as
an output a pointer to a parent procedure. The parent
procedure is the one including the block.

The navigational routines for blocks and instructions perform
similar functions at the block and instruction level. For
example, GetInstBlock receives as an input a pointer to an
instruction and returns as output a pointer to a parent block
of the instruction. The procedures of the program, and the
blocks of the procedure can be located by following the 
procedure flow graphs 200 and the program call graph 300. 
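
The navigational routines can be mimicked in a short sketch to show how a user routine might walk a static program. The routine names come from Table 1, but the signatures, the `Node` class, and the use of `None` for "null" are assumptions for illustration. The text states that the FIR 47 can be written in C, so this Python sketch only models the behavior.

```python
class Node:
    """Hypothetical program component: a program, procedure, block,
    or instruction, linked to its parent and its ordered children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def _first(node):
    return node.children[0] if node.children else None


def _next(node):
    siblings = node.parent.children
    position = siblings.index(node) + 1
    return siblings[position] if position < len(siblings) else None


# The same traversal logic serves all three levels of the hierarchy.
GetFirstProc = GetFirstBlock = GetFirstInst = _first
GetNextProc = GetNextBlock = GetNextInst = _next


def GetBlockProc(block):
    return block.parent


def GetInstBlock(inst):
    return inst.parent


def walk_instructions(program):
    """Yield every instruction, procedure by procedure, block by block."""
    proc = GetFirstProc(program)
    while proc is not None:
        block = GetFirstBlock(proc)
        while block is not None:
            inst = GetFirstInst(block)
            while inst is not None:
                yield inst
                inst = GetNextInst(inst)
            block = GetNextBlock(block)
        proc = GetNextProc(proc)
```

The nested loops terminate because each "next" routine returns `None` when there is nothing left to navigate, matching the "null" return described above.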

The "edge" routines are described with reference to FIG.
5. FIG. 5 shows the procedure flow graph 200 and the
program call graph 300 superimposed on the organized
program 100. While executing, the interfaces or "edges"
between the procedures 101-103 of the program are conjugate
pairs of basic blocks, for example, call blocks 151 and
return blocks 152. The call block 151 is the last block
executed in a procedure before execution control is transferred
to another procedure. Upon return, the return block
152 is executed first.

Each call block 151 has a single successor entry block 153
in the called procedure. The entry block is executed first
when a procedure is called by another procedure of the
program. Each return block 152 has a single predecessor exit
block 154. The exit block 154 is the last block executed
before execution control is transferred to the calling procedure.
An entry block may have many predecessor blocks;
likewise, an exit block 154 may have many successor blocks.
The normal blocks 155 are not involved in the interprocedural
transfer of execution control. The normal blocks 155 define, use and
consume variables and registers, for example variables x, y,
and z. Following the "edges" of the program enables the
tracing of the execution flow while the program is static.

GetFirstSuccEdge and GetNextSuccEdge can be used to 
follow all possible execution paths from a specified block. 
Tracing the predecessor edges using GetFirstPredEdge and 
GetNextPredEdge locates all the possible execution paths 
that lead to the first instruction of the specified basic block.
GetEdgeTo and GetEdgeFrom can be used to trace the 
execution of the program from edge to edge. 
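
As a rough illustration of following successor edges, the C sketch below walks every block reachable from a starting block. The Edge and Block structs, the visited flag, and the routine signatures are assumptions made for the sketch; only the routine names GetFirstSuccEdge, GetNextSuccEdge, GetEdgeTo and GetEdgeFrom come from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Block Block;
typedef struct Edge  Edge;

/* Hypothetical representation: each block heads a list of successor edges. */
struct Edge  { Block *from, *to; Edge *next_succ; };
struct Block { int id; int visited; Edge *first_succ; };

Edge  *GetFirstSuccEdge(Block *b) { return b ? b->first_succ : NULL; }
Edge  *GetNextSuccEdge(Edge *e)   { return e ? e->next_succ : NULL; }
Block *GetEdgeTo(Edge *e)         { return e ? e->to : NULL; }
Block *GetEdgeFrom(Edge *e)       { return e ? e->from : NULL; }

/* Depth-first walk over successor edges.
   Returns the number of blocks reachable from b, including b itself. */
int CountReachable(Block *b)
{
    if (b == NULL || b->visited)
        return 0;
    b->visited = 1;
    int n = 1;
    for (Edge *e = GetFirstSuccEdge(b); e != NULL; e = GetNextSuccEdge(e)) {
        assert(GetEdgeFrom(e) == b);   /* every edge in the list leaves b */
        n += CountReachable(GetEdgeTo(e));
    }
    return n;
}

/* Demo graph: 0->1, 0->2, 1->3, 2->3, 3->1 (a loop); block 4 has no edges. */
int DemoReachableFromEntry(void)
{
    static Block blk[5];
    static Edge  edg[5];
    static const int from[5] = {0, 0, 1, 2, 3};
    static const int to[5]   = {1, 2, 3, 3, 1};

    for (int i = 0; i < 5; i++) {
        blk[i].id = i;
        blk[i].visited = 0;
        blk[i].first_succ = NULL;
    }
    for (int i = 0; i < 5; i++) {
        edg[i].from = &blk[from[i]];
        edg[i].to   = &blk[to[i]];
        edg[i].next_succ = blk[from[i]].first_succ;   /* push on the front */
        blk[from[i]].first_succ = &edg[i];
    }
    return CountReachable(&blk[0]);
}
```

The visited flag keeps the walk finite when the flow graph contains loops. Predecessor edges could be followed the same way with a second list per block.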

Now continuing with the description of the instrumentor 55
of FIG. 4, the user instrumentation routines (UIR) 48 are
constructed by including calls to the fundamental
instrumentation routines 47 to find specific organizational
structure of the program and to trace the execution flow. For
example, the user can locate the last instruction of every
procedure, the first instruction of every block, or only
instructions which call other procedures, to give but a few
examples.

The operational routines 47b are listed in Table 2. The 
operational routines provide information about portions of 
the program traversed by the navigational routines 47a. 

TABLE 2

Operational Routines

Program                Procedure            Block

GetProgramInfo         GetProcName
GetProgramInstArray    GetFileName
GetProgramInstCount    GetProcPC            GetBlockPC
GetProgramName         GetNamedProcedure

GetProgramInfo returns static information about the program,
such as the starting memory address of the program
and the size of the memory space used by the program while
executing. Storage space is required for the instructions or
"text" of the program and for data manipulated by the
program. Memory storage allocation will be described in
greater detail with reference to FIGS. 7 and 8.

GetProgramInstArray returns a pointer to an identified set
of instructions of the program. The identified instructions
can be passed to the user analysis routines 49 for collateral
manipulation during execution.

For example, the user analysis routines 49 may want to
collect dynamic address information from the instructions
while the program is executing.

GetProgramInstCount returns the number of instructions
in the set identified by the GetProgramInstArray routine.

GetProgramName returns the name of the program.
GetNamedProc receives as input a procedure name, and returns
the pointer to the named procedure.

GetProcName takes as an input a pointer to a procedure
and returns as an output the name of the procedure.
GetFileName, given the pointer to the procedure, returns the
name of the file that is used to store the procedure.
GetProcPC returns the run-time memory address of the
procedure. GetNamedProc, given a procedure name, returns the
pointer to the procedure. A null is returned if the named
procedure does not occur in the program. GetBlockPC, given
a pointer to a block, returns the memory address (PC) of the
first instruction of the block.

Table 3 lists the parsing routines 47c which can be used
to identify and parse specific instructions.

TABLE 3

Parsing Routines

Routine               Value Returned

IsInstType            instruction type
GetInstInfo           instruction information
GetInstRegEnum        register type
GetInstRegUsage       register usage
GetInstPC             instruction memory address
GetInstBinary         instruction binary code
GetInstClass          instruction classification
GetInstProcCalled     procedure called

GetInstType will return a logical true condition if the
specified instruction is of a particular type. Different types of 
instructions include load, store, jump, branch, multiply, 
divide, floating point, etc. This routine can be used to 
determine if an instruction references memory or a register, 
and if execution flow is conditionally or unconditionally 
changed by, for example, branch instructions. 

GetInstInfo parses the instruction into fields such as:
operation code (opcode) field, memory displacement field, 
branch displacement field, addressing mode field, register 
fields, and the like. 

To determine which of the registers are referenced by an
instruction, one can use the GetInstReg routine. GetInstReg
takes as input an instruction and an instruction type, and
returns the type of registers used in the instruction, for
example, integer, double precision, floating point, or
specialized, e.g., PC, SP, etc.

To determine the actual usage of registers, the
GetInstRegUsage routine is used. For a given instruction,
GetInstRegUsage indicates which registers are simply read, and
which registers are written.

GetInstPC returns the memory address or PC of the
instruction.

GetInstProcCalled returns the name of the procedure
directly called by the instruction, presuming the instruction
is either a branch or jump type of instruction.

GetInstBinary receives as an input a PC and returns the
binary coded instruction stored at that memory address. This 
routine can be used to retrieve, for example, all the instruc- 
tions of the program. 

GetInstClass returns the instruction class. The instruction
classes define instructions for specific hardware
architectures. The classes of instructions can be stored, for example,
in the CPU description 19. Hardware specific classes can
include integer and floating point loads and stores, integer
and floating point arithmetic, logical functions, e.g., AND,
OR, XOR, and shift functions, e.g., rotate, scale, etc.

Table 4 lists the fundamental instrumentation routines that 
are available for modifying the program so that performance 
data can be collected. 

TABLE 4

Modification Routines

AddCallProto
AddCallProgram
AddCallProc
AddCallBlock
AddCallInst
AddCallEdge



The program can be modified at a point before or after the
identified portion so that during execution performance data
can be collected before and after executing the identified
portion. In the preferred embodiment of the invention, all
communication between the modified program and the user
analysis procedures 49 is by procedure calls. Procedure
calls reduce processing time associated with the monitoring
process. The identified portion can be the program, a
procedure, a block, an edge, or an instruction.

Any data stored in registers used by the user analysis 
routines 49 must remain unchanged to preserve the execu- 
tion state of the modified program. The data can be saved in 
the stack before execution control is passed to the analysis 
procedures 49, and restored upon return. 

The routine AddCallProto is used to define what
performance data are to be passed to the analysis routines during
execution. Typical data to be passed may include static data,
such as opcode, or dynamic data, for example, register or
memory contents. It is also possible to pass computed values
such as an effective memory address and a condition value.
The effective memory address should only be used with
instructions which reference memory, i.e., load and store.
The computed value is the base memory address plus the
sign extended relative displacement. The condition value
should only be used when instrumenting branch instructions.
If the branch is not taken, the value passed is zero.
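
The two computed values can be illustrated with a short C sketch. The 16-bit displacement width, the 64-bit address width, and the function names are assumptions chosen for the example. The text only states that a not-taken branch passes zero, so the value 1 for a taken branch is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit displacement field (width assumed) to 64 bits. */
int64_t SignExtend16(uint16_t disp)
{
    return (disp & 0x8000u) ? (int64_t)disp - 0x10000 : (int64_t)disp;
}

/* Effective address = base address + sign-extended relative displacement.
   Unsigned arithmetic wraps modulo 2^64, so negative displacements work. */
uint64_t EffectiveAddress(uint64_t base, uint16_t disp)
{
    return base + (uint64_t)SignExtend16(disp);
}

/* Condition value: zero when the branch is not taken.
   Returning 1 for a taken branch is an assumption of this sketch. */
int ConditionValue(int branch_taken)
{
    return branch_taken ? 1 : 0;
}
```

For instance, a base of 0x1000 with displacement field 0xFFF8 (that is, -8) gives an effective address of 0x0FF8.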

It is also possible to pass arrays to the analysis routines 49. 
The following sample user instrumentation routine, in the C 
language, creates a data structure that contains the program 
counters for each procedure. The data structure is passed as
an argument to an "OpenFile" user analysis routine.



Sample User Instrumentation Routine

    int number;
    char prototype[100];

    number = 0;
    for (proc = GetFirstProc(); proc != NULL; proc = GetNextProc(proc))
    {
        number++;
    }
    pcArray = (long *) malloc(sizeof(long) * number);
    number = 0;
    for (proc = GetFirstProc(); proc != NULL; proc = GetNextProc(proc))
    {
        pcArray[number++] = ProcPC(proc);
    }
    sprintf(prototype, "OpenFile(long[%d])", number);
    AddCallProto(prototype);
    AddCallProgram(ProgramBefore, "OpenFile", pcArray);

AddCallProgram takes as an argument the location or 
place where the program is to be modified, for example, the 
beginning or the end. AddCallProgram inserts a procedure 
call to an analysis routine at the specified location. During
execution, the call causes transfer of control to the analysis
routine so that performance data can be collected. 

AddCallProc is similar at the procedure level. The seman- 
tics of modifying the program before and after procedures 
and basic blocks are maintained even if there are multiple
entry points and multiple exit points. For example, if a
procedure has multiple entry points, adding a call before the 
procedure will add the call for each entry point of the 
procedure, and will only call the analysis routine once, 
regardless of which entry point is selected during execution. 

The UIR 48, in cooperation with the fundamental
instrumentation routines, is used to traverse the entire program.
The UIR 48 can locate and identify specific portions of the 
program, e.g., procedures, blocks, and instructions, to be 
modified for monitoring. 

The UIR 48 can be created by the user as a high-level
source code module. The UIR 48, when compiled and linked
with the FIR 47, adapts the instrumentor 55 to perform a
specific monitoring task.

By way of example, FIG. 6 shows a user instrumentation
routine BRANCH 400 for locating all conditional branch
instructions in the program. The branch instructions are
found by tracing the execution flow through the static
program and identifying only instructions of the branch
type. The branch instructions thus located can be modified so
that the branch prediction rate can be determined.

In steps 401 and 410, the procedures 101-103 of the
program are located. Step 420 locates the blocks 105 of each
procedure, and in step 430 the last instruction of each block
is located. In step 440, a determination is made to see if the
last instruction of the block is of the desired type, for
example, conditional branch. If the instruction is of the
desired type, the program is modified in step 450.
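
The steps of FIG. 6 can be sketched in C roughly as follows. The data structures, the InstType enumeration, the analysis routine name "CondBranch", and the simplified AddCallInst stand-in (which only records the requested call instead of rewriting code) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { InstTypeCondBr, InstTypeLoad, InstTypeStore, InstTypeOther } InstType;

/* Hypothetical model: only the last instruction of each block is kept. */
typedef struct Inst  { InstType type; const char *analysis_call; } Inst;
typedef struct Block { Inst *last_inst; struct Block *next; } Block;
typedef struct Proc  { Block *first_block; struct Proc *next; } Proc;

static Proc *g_first_proc;   /* assumed global head of the procedure list */

Proc  *GetFirstProc(void)     { return g_first_proc; }
Proc  *GetNextProc(Proc *p)   { return p ? p->next : NULL; }
Block *GetFirstBlock(Proc *p) { return p ? p->first_block : NULL; }
Block *GetNextBlock(Block *b) { return b ? b->next : NULL; }
Inst  *GetLastInst(Block *b)  { return b ? b->last_inst : NULL; }
int    IsInstType(const Inst *i, InstType t) { return i != NULL && i->type == t; }

/* Stand-in for AddCallInst: only records which analysis routine to call. */
void AddCallInst(Inst *i, const char *analysis_routine)
{
    assert(i != NULL);
    i->analysis_call = analysis_routine;
}

/* Locate and mark every block-ending conditional branch.
   Returns the number of instructions modified. */
int InstrumentCondBranches(void)
{
    int modified = 0;
    for (Proc *p = GetFirstProc(); p != NULL; p = GetNextProc(p)) {          /* steps 401, 410 */
        for (Block *b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {  /* step 420 */
            Inst *last = GetLastInst(b);                                     /* step 430 */
            if (IsInstType(last, InstTypeCondBr)) {                          /* step 440 */
                AddCallInst(last, "CondBranch");                             /* step 450 */
                modified++;
            }
        }
    }
    return modified;
}

/* Demo: two procedures, three blocks, two of which end in a conditional branch. */
int DemoInstrumentCondBranches(void)
{
    static Inst  i0, i1, i2;
    static Block b0, b1, b2;
    static Proc  p0, p1;

    i0.type = InstTypeCondBr; i0.analysis_call = NULL;
    i1.type = InstTypeOther;  i1.analysis_call = NULL;
    i2.type = InstTypeCondBr; i2.analysis_call = NULL;
    b0.last_inst = &i0; b0.next = &b1;
    b1.last_inst = &i1; b1.next = NULL;
    b2.last_inst = &i2; b2.next = NULL;
    p0.first_block = &b0; p0.next = &p1;
    p1.first_block = &b2; p1.next = NULL;
    g_first_proc = &p0;
    return InstrumentCondBranches();
}
```

Only the last instruction of each block is examined because a branch, by construction, ends a basic block.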

Following is a summary of some exemplary user instrumentation
and analysis routines that can be used to monitor
different operating characteristics of the program. These
routines illustrate the adaptability of the monitoring system 
to collect and analyze different performance characteristics. 
For each monitoring opportunity there is an instrumentation 
and an analysis routine. 
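
As a hedged illustration of the analysis half of such a pair, the C sketch below simulates a direct-mapped cache of the kind named in the CACHE summary that follows (8K bytes, 32 byte lines). The Cache struct, the function names, and the reading of "read allocate" as "a line is filled only on a load miss, never on a store miss" are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    CACHE_BYTES = 8 * 1024,                 /* 8K byte cache */
    LINE_BYTES  = 32,                       /* 32 byte lines */
    NUM_LINES   = CACHE_BYTES / LINE_BYTES  /* 256 lines */
};

typedef struct {
    uint64_t      tag[NUM_LINES];
    unsigned char valid[NUM_LINES];
    long          hits, misses;
} Cache;

void CacheInit(Cache *c) { memset(c, 0, sizeof *c); }

/* Analysis routine: would be called with the effective address of each
   load (is_load != 0) or store (is_load == 0) in the instrumented program. */
void Reference(Cache *c, uint64_t addr, int is_load)
{
    uint64_t line  = addr / LINE_BYTES;
    unsigned index = (unsigned)(line % NUM_LINES);  /* direct mapped: one slot */
    uint64_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return;
    }
    c->misses++;
    if (is_load) {              /* read allocate: fill the line on load misses only */
        c->valid[index] = 1;
        c->tag[index]   = tag;
    }
}

/* Demo reference stream; reports the resulting hit and miss counts. */
void DemoCache(long *hits, long *misses)
{
    Cache c;
    CacheInit(&c);
    Reference(&c, 0x0000, 1);   /* load miss, line allocated */
    Reference(&c, 0x0004, 1);   /* load hit, same line */
    Reference(&c, 0x2000, 0);   /* store miss, maps to same slot, not allocated */
    Reference(&c, 0x0000, 1);   /* load hit, line still resident */
    Reference(&c, 0x2000, 1);   /* load miss, replaces the resident line */
    Reference(&c, 0x0000, 1);   /* load miss, original line was replaced */
    *hits = c.hits;
    *misses = c.misses;
}
```

Addresses 0x0000 and 0x2000 are exactly 8K bytes apart, so they compete for the same cache line, which is what makes the last two loads miss.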

CACHE

The CACHE instrumentation and analysis routines can be 
used to simulate the execution of the program with, for 
example, an 8K direct mapped cache having 32 byte lines, 
and a read allocate policy. The instrumentation routine 
modifies the "load" and "store" type of instructions and
passes the effective memory addresses referenced by these 
instructions to the analysis routine. Cache "hits" and 
"misses" are simulated upon a reference, and cache tags are
updated to reflect the current state of the cache.

MALLOC

The MALLOC instrumentation routine modifies the program
to collect statistics on dynamically allocated memory.
The routine identifies the portions of the program which
dynamically allocate memory and inserts a call to the
complementary analysis routine to create a histogram of
dynamic memory allocations.

TRACE

In order to trace the starting address of each executed
basic block and the effective address of each memory
reference, the TRACE routines can be used.

DYNINST

To dynamically count the number of times specific
procedures, basic blocks, and instructions, e.g., load and
store, are executed, use the DYNINST routines.

BLOCK

The BLOCK routines count the number of times an identified
basic block of the program is executed.

PROFILE

The PROFILE instrumentation routine modifies the program
to count the total number of instructions that are
executed in each of the procedures and to compute the
percentage of instructions executed. The PROFILE analysis
routine collects these performance data and prints an
execution profile of the program.

BRANCH

These routines enable the monitoring of branch prediction
rate for each "branch" instruction. The analysis routine
prints a histogram of the branch prediction rate for the entire
program, see FIG. 6.

DTB

The data translation buffer routines can be used to monitor
the efficiency of, for example, a 32 byte entry data
translation buffer using a "not most recently used" replacement
policy for 8K byte memory pages.

CLASSES

The CLASSES instrumentation routine statically counts
the number of instructions in the program for each
instruction class.

INLINE

These routines may be used to identify procedures of the
program which may be placed in-line to improve execution
performance. The output of the analysis routine includes
procedure names, call addresses, and the number of times
each procedure is called.

When the procedures 100 are fully instrumented, the user
analysis routines 49 are added and linked in such a way that
the execution of the program 20 is unaffected. In order to
ensure accurate performance data, the analysis routines 49
are always presented with a program state representing the
uninstrumented or "pure" version of the program.

The analysis routines 49 do not share any procedures or
data with the program 20 being monitored. If both the
program 20 and the analysis routines 49 require the same
library procedure, for example, a procedure to print data, a
copy of the required library procedure is incorporated in the
program and a separate copy is incorporated in the analysis
routines 49.

During execution of the instrumented program, the
instrumented executable code 60' and the analysis routines share
the same address space of the memory 3. The user data 9A
of the program are not relocated so that the original data
addresses can be maintained.

FIGS. 7 and 8 show the allocation of memory storage
space for the program before and after instrumentation,
respectively. Beginning at a low memory address are stored
the read-only user data 201. The read-only data 201 are data
which should not be modified during execution of the
program. The read-only user data 201 are followed by the
exception data 202. The exception data 202 are unanticipated
data generated by the CPU 2 due to unanticipated
exception conditions encountered while executing the
program, for example, page fault, illegal operation, etc. In
the instrumented code 60', the analysis text 203, which are
the machine dependent executable instructions of the user
analysis routines, are located between the program text 204
and the exception data 202. The analysis text can be in the
form of a text file as described for the Microsoft Portable
Executable (PE) file format described above.

Beginning at a high memory address are the uninitialized
and initialized user data 208 and 209. For the instrumented
code 60', the uninitialized and initialized analysis data 207
and [illegible] follow the program data 209-208. The uninitialized
analysis data are converted to initialized analysis data by
setting all values to zero.

The instrumentation of the program by adding
instructions, and the addition of the analysis routines 49, has
caused the program addresses to be relocated. However, it is
possible to statically map the relocated addresses of the program
to the unmodified addresses. For example, if the analysis
routines 49 need the program counter (PC) of the program,
the run-time PC can be mapped to the original PC of the
uninstrumented code 60.

Now continuing with reference to FIG. 4, the code
generator 57, using a hardware architecture selected from
the CPU description 19, can produce machine dependent
executable code 60' from the instrumented procedures 100.
The executable code 60' can be loaded into the memory 3
of a target hardware of choice for execution. During
execution, the instrumented code will call the analysis
routines. The routines collect data 9B which are stored
in the memory 3. The data can be analyzed in real-time as
the program is executing, or the data can be analyzed by the
user after the program finishes execution.

In conclusion, the monitoring system as described herein
provides a single framework for monitoring the performance
characteristics of computer systems while executing
programs. The system handles the details of program
modification while the user concentrates on what performance data
are to be collected, and how the performance data are to be
analyzed. The performance data are communicated to the
analysis routines by simple fast procedure calls, reducing the
monitoring overhead.

The common structures of programs to be identified and
modified can be performed by fundamental instrumentation
procedures. The user supplies simple problem specific
procedures for different monitoring tasks.

It will be apparent to those skilled in the art that various
modifications can be made to the present invention without
departing from the spirit and scope of the invention as set out
in the appended claims.

We claim:

1. A computer implemented method for modifying a
program, the program including first instructions to be
executed in a computer system having a memory, comprising
the steps of:
processing the program in a compiler;
translating the first instructions of the program received
from the compiler into a program module of an
intermediate language;
modifying the program module by including at least one
additional instruction to transfer control to a user
analysis routine to monitor performance in the program
module, wherein the user analysis routine does not 
share procedures or data with the program module; and 
generating from the modified program module a modified 
program having second instructions.

2. The method of claim 1 further comprising the step of:
partitioning the program module into a plurality of basic
components, the basic components including
procedures of the program, and the procedures including
basic blocks.

3. The method of claim 1 further comprising the step of:
constructing, in a memory of the computer system, a flow
graph for each procedure and a program call graph, the
procedure flow graphs indicating the execution flow
within each procedure, the program call graph to
indicate the execution flow among the procedures.

4. The method of claim 3 further comprising the step of:
tracing an execution flow through the plurality of basic
program components using the procedure flow graphs
and the program call graph to identify a particular basic
component to modify.

5. The method of claim 4 further comprising the step of:
wherein the additional instruction is a call instruction to
the user analysis routine, the call instruction being
executed when the specific program component is
executed in the computer system.

6. The method of claim 1 further comprising the step of:
combining the user analysis routine with the modified
program.

7. The method of claim 1 wherein a first hardware 
architecture is associated with the first machine dependent 
executable instructions, and a second hardware architecture 
is associated with the second machine executable
instructions.

8. A computer implemented method for modifying a
program module in the form of instructions of an
intermediate machine readable language produced by a translator
into machine dependent executable instructions to be carried
out in a computer system comprising the steps of:

modifying the program module by partitioning the
program module into basic program components, and
including at least one additional instruction to transfer
control to a user analysis routine to monitor
performance in at least one of the basic program components,
wherein the user analysis routine does not share
procedures or data with the program module; and

generating machine dependent executable instructions
from the modified program module.

9. The method of claim 8 wherein the step of modifying
the program module includes partitioning the program
modules into the basic program components and tracing an
execution flow through such program components to deduce
structural and execution flow information.

10. The method of claim 9 wherein the step of tracing is 
accomplished using navigational routines. 

11. The method of claim 9 wherein the step of modifying
the program module further includes locating a program
component to be monitored during tracing using the deduced
structural information.

12. The method of claim 11 wherein the step of modifying
the program module further includes inserting a call
instruction in the located program to be monitored, the call
instruction to transfer execution control to a user analysis routine
to monitor the program component to be monitored during
execution of such program component in the computer
system.

13. A computer implemented method for modifying a 
program module produced by a translator in the form of an
intermediate machine readable language having instructions 
independent of computer hardware architecture into 
machine dependent executable instructions to be carried out 
in a computer system comprising: 

partitioning the linked code module into basic program 
components; 

tracing an execution flow through the basic program
components using navigational routines to deduce
structural and flow information;

locating program components to be monitored during the
tracing step using the deduced structural information;
and

inserting a call instruction in the located program com- 
ponent to be monitored, the call instruction for trans- 
ferring execution control to a user analysis routine to 
monitor the program component to be monitored dur- 
ing execution of the program component in the com- 
puter system, wherein the user analysis routine does not 
share procedures or data with the program module. 

14. A method of modifying a computer program formed 
of machine dependent executable instructions, comprising 
the steps of: 

translating the machine dependent executable instructions 
into a program module in an intermediate language; 

partitioning the program module into a plurality of basic 
program components; 

tracing an execution flow through the plurality of basic 
program components using navigational routines; 

while tracing the execution flow through the plurality of 
basic program components, modifying the plurality of 
basic program components by inserting a call instruc- 
tion to transfer execution control to a user analysis 
routine when a specific program component is executed 
in the computer system, wherein the user analysis 
routine does not share procedures or data with the 
program module; and 

generating instrumented machine dependent executable 
code from the modified plurality of basic program 
components after inserting the call instruction. 

15. A method of modifying a computer program formed 
of machine dependent executable code, comprising the steps 
of: 

translating the machine dependent executable code into 
machine independent executable code; 

inserting a call instruction referencing a user analysis 
routine into the machine independent executable code, 
wherein the user analysis routine does not share pro- 
cedures or data with the program module; and 

generating modified machine dependent executable code 
from the machine independent executable code which 
includes the call instruction. 

***** 